OPERATIONAL CONSTRAINTS ON THE UTILITY OF
VIRTUAL AUDIO CUEING


Mark  A.  Ericson,  Robert  S.  Bolia,  W.  Todd  Nelson,  and  Richard  L.  McKinley 
Air  Force  Research  Laboratory,  Wright-Patterson  AFB,  OH  45433 

The potential of virtual audio display technology to provide operators with veridical spatial cues may be
substantially constrained by factors that are common in many operational settings, e.g., high noise levels and
limitations in the bandwidth of the audio source and/or display. The purpose of this study was to examine
the effects of varying the bandwidth of a virtual sound source in the presence of broadband noise in a
reverberant environment. Specifically, the signal-to-noise ratio (SNR) was varied from +50 dB to -10 dB,
and the signal was low-pass filtered at 1.6 kHz, 4 kHz, 8 kHz, and 15 kHz. Correlational analyses
comparing actual and perceived sound source location revealed that both signal bandwidth and
signal-to-noise ratio influenced auditory localization acuity, and that even under optimal bandwidth and noise
conditions (15 kHz and +50 dB) localization in elevation was extremely poor. These findings have
numerous implications for the design of spatial audio displays, especially those meant to be used in noisy
environments.


INTRODUCTION 

King and Oldfield (1997) systematically varied the
bandwidth of signals to determine the minimal frequency
composition of a signal that provides optimal localization
acuity. Signals with low-pass cutoff frequencies of 13 kHz
enabled listeners to best determine the azimuth and elevation
of signals and reduce the number of front-back reversals.
Signals with a high-pass cutoff frequency of 9 kHz minimized
the front-back reversal rate in azimuth. Ideally, such wide
bandwidth  signals  should  be  used  when  possible  to  optimize 
performance  with  directional  audio  displays.  While  King  and 
Oldfield’s  results  have  important  implications  for  the  design  of 
effective  spatial  audio  displays,  their  empirical  study  did  not 
address  the  potential  effects  of  noise  and/or  the  interaction  of 
noise  and  reduced  bandwidth  on  virtual  audio  cueing. 

Most  free-field  binaural  masking  experiments  have  used  a 
single  directional  masker  and  a  single  directional  signal  in  an 
anechoic  environment  (Gilkey  and  Good,  1996).  The  presence 
of  a  directional  masker  tends  to  "push"  or  "pull"  the  perceived 
location  of  the  signal.  A  single  masker  is  the  least  complex  of 
all  possible  masking  conditions.  At  the  other  extreme  of 
masking  complexity  is  the  case  of  an  infinite  number  of 
maskers  presented  all  around  a  listener.  A  reverberation 
chamber  can  approximate  a  spatially  diffuse  masking  condition 
over  a  wide  range  of  frequencies,  typically  from  100  Hz  to  8 
kHz.  Hirsh  (1950)  measured  auditory  localization  acuity  of 
human  listeners  in  highly  reverberant  environments.  The 
directional  signals  and  masker(s)  were  located  together  in  the 
reverberant  environment.  In  general,  such  signals  are  more 
difficult  to  localize  and  are  more  easily  masked  than  in  free- 
field  listening  conditions.  Most  real-world  listening 
environments  fall  between  anechoic  and  highly  reverberant 
conditions. 

The advent of virtual audio technology provides an
opportunity to present directional signals over headphones
while a listener is immersed in some ambient noise
environment. The purpose of the present study was to assess the
effects of ambient masking noise, as found in many operational
environments, on a listener’s ability to identify the direction of
a virtual sound source presented over headphones.
Additionally, these data are meant to be compared to empirical
results reported in the literature, as summarized in Table 1.

Method 

Participants. Three male and three female listeners participated in
these experiments. All participants had normal hearing threshold
levels and localization acuity within 30 degrees.
Participants were paid for their participation.

Experimental Design. Two experiments were conducted
with the same design. The first experiment used a 300 ms noise
stimulus with no head tracking. The second experiment used a
continuous noise source with head tracking. The following
description applies to both experiments. Six listeners
participated in this within-subjects factorial design. Five
signal-to-noise ratios (+50, +10, 0, -5, and -10 dB) and four
low-pass filter cutoff frequencies (1.6, 4, 8, and 15 kHz)
were employed. A total of 37 target locations were used.
Twenty-four of the 37 target angles were equally distributed,
eight each, along the median, frontal, and transverse planes.
Five orthogonal vertices (front, back, left, right, and zenith)
were also chosen. The eight remaining locations were
symmetrically distributed at the locations of ±45° azimuth and
±37° elevation. The volunteer listeners were randomly
assigned to one of six blocks to reduce order effects due to
possible learning of the task. Each listener responded to 5
signal-to-noise ratios × 4 cutoff frequencies × 37 angles × 5
repetitions = 3,700 data points.
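
The trial count given above can be checked with a short enumeration. The following Python sketch is illustrative only: the variable names are hypothetical, target locations are represented by indices rather than coordinates, and the highest SNR is taken as +50 dB, as stated in the abstract. The session size of 185 presentations comes from the Procedure description.

```python
from itertools import product

# Factor levels as described in the text (+50 dB is assumed from the abstract).
SNR_DB = [50, 10, 0, -5, -10]      # signal-to-noise ratios, dB
CUTOFF_KHZ = [1.6, 4, 8, 15]       # low-pass cutoff frequencies, kHz
N_LOCATIONS = 37                   # target locations, indexed 0..36 (coordinates not modeled)
N_REPETITIONS = 5                  # repetitions per location and condition
TRIALS_PER_SESSION = 185           # stimulus presentations per session (from the Procedure)


def build_trials():
    """Return every (snr, cutoff, location, repetition) combination."""
    return list(product(SNR_DB, CUTOFF_KHZ,
                        range(N_LOCATIONS), range(N_REPETITIONS)))


trials = build_trials()
n_conditions = len(SNR_DB) * len(CUTOFF_KHZ)       # SNR x cutoff cells
n_sessions = len(trials) // TRIALS_PER_SESSION     # sessions needed per listener

print(len(trials), n_conditions, n_sessions)
```

Because 185 = 37 × 5, the number of sessions equals the number of SNR-by-cutoff cells. This is consistent with one session per cell, although the text does not state that sessions were organized this way.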

Table 1. Summary of existing empirical localization data.

Apparatus. All experiments were conducted in facilities of
the Air Force Research Laboratory’s Aural Displays and
Bioacoustics Branch at Wright-Patterson Air Force Base. The
8000 ft³ reverberation chamber and sound system of the voice
communications research evaluation system (VOCRES)
facility were used to generate an ambient noise field at 95 dBA
SPL. Two General Radio 1382 random noise generators
produced the uncorrelated ambient masking and directional
target noises. The masker noise was played over the VOCRES
sound system. The directional target noise was passed through
a Hewlett-Packard low-pass filter set and Wilsonics programmable
attenuators, directionally encoded either by a Tucker-Davis
Technologies (TDT) Digital Signal Processing System
(Experiment 1) or by an Air Force Research Laboratory 3-D
Auditory Display Generator (3-D ADG; see McKinley,
Ericson and D’Angelo, 1994, for a detailed description)
(Experiment 2), and presented to listeners over Sennheiser HD
560-11 headphones.

Head motion cues were provided via a Fast-Trak electromagnetic
tracking system coupled to the 3-D ADG. The Fast-Trak
measured the orientation of the listener’s head 120 times
per second with a latency of 8 ms to enable a space-stabilized
auditory image of the target noise. Listeners reported the
perceived direction of the target noise using the GELP (Gilkey
et al., 1995) spherical pointing technique. The GELP
technique employs a hand-held stylus to record azimuth and
elevation responses automatically via the Fast-Trak system.
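
The idea of a space-stabilized image can be illustrated with a minimal Python sketch. It is not a description of the 3-D ADG's processing. It assumes yaw-only head rotation, azimuth measured in degrees with 0° straight ahead and positive values to the right, and hypothetical function names. On each tracker update, the azimuth rendered over headphones is the world-fixed source azimuth minus the current head yaw.

```python
def wrap_degrees(angle):
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def head_relative_azimuth(source_az, head_yaw):
    """Azimuth of a world-fixed source relative to the rotated head (yaw only)."""
    return wrap_degrees(source_az - head_yaw)


UPDATE_RATE_HZ = 120                          # tracker update rate from the text
sample_period_ms = 1000.0 / UPDATE_RATE_HZ    # time between orientation samples

# A source fixed at 45 degrees appears straight ahead once the head turns 45 degrees.
print(head_relative_azimuth(45.0, 45.0), round(sample_period_ms, 2))
```

At 120 updates per second the interval between orientation samples is about 8.3 ms, which is of the same order as the 8 ms latency reported for the tracker.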

Procedure.  The  participants  were  seated  inside  the 
VOCRES  reverberation  chamber  and  provided  with  the 
headphones,  head  tracking  sensor  and  stylus.  Listeners  were 
instructed  to  keep  their  head  level  and  facing  towards  the  front 
before  stimulus  presentation.  For  both  experiments,  head 
motion  was  unrestricted  while  pointing  with  the  stylus.  After 
responding,  listeners  were  instructed  to  return  to  the  starting 
position.  Each  session  included  185  stimulus  presentations 
and  lasted  approximately  twenty  minutes. 

Results 

Results  from  both  experiments  are  displayed  in  Figures  1 
and  2.  In  Figure  1,  response  azimuth  is  plotted  as  a  function  of 
target  azimuth  for  the  extreme  SNR  and  bandwidth  conditions 
in each experiment. Figures 1a and 1c represent data collected
under conditions of a +50 dB SNR and a 15 kHz bandwidth,
while Figures 1b and 1d represent data collected under
conditions of a -10 dB SNR and a 1.6 kHz bandwidth. In
Figure  2,  response  elevation  is  plotted  as  a  function  of  target 
elevation  for  the  same  conditions.  Pearson  product  moment 
correlation  coefficients  were  computed  for  all  eight  conditions, 
and  are  displayed  in  each  of  the  eight  scatter  plots. 
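
For reference, the Pearson product-moment coefficient can be computed as in the following Python sketch. The function name and the sample values are hypothetical and are not data from these experiments. The coefficient treats angles as ordinary linear quantities, so for azimuth its value depends on how angles near the ±180° wrap are coded.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]          # deviations from the mean of x
    dy = [b - mean_y for b in y]          # deviations from the mean of y
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical target and response elevations in degrees (illustrative only).
targets = [-45, 0, 45, 90]
responses = [-40, 5, 50, 80]
r = pearson_r(targets, responses)
print(round(r, 3))
```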

It is evident in Figures 1a and 1c (broadband signal, no noise)
that perceived and actual sound source locations were highly
correlated, with the exception of sound sources that were
located on the median plane (i.e., 0° and 180°). This latter
result can be explained by noting that the data illustrated in
these figures were not corrected for front-back confusions.
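
No front-back correction was applied to the data reported here. As an illustration of what such a correction commonly involves, the Python sketch below mirrors a response azimuth about the interaural axis and keeps the mirrored value when it lies closer to the target. The criterion, the function names, and the angle convention (degrees, 0° front, 180° back, positive to the right) are assumptions made for this example.

```python
def wrap_degrees(angle):
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def angular_error(a, b):
    """Absolute angular difference between two azimuths, in degrees."""
    return abs(wrap_degrees(a - b))


def mirror_front_back(az):
    """Reflect an azimuth about the interaural (left-right) axis."""
    return wrap_degrees(180.0 - az)


def resolve_front_back(target_az, response_az):
    """Return (azimuth, was_confused), mirroring the response if that reduces error."""
    mirrored = mirror_front_back(response_az)
    if angular_error(target_az, mirrored) < angular_error(target_az, response_az):
        return mirrored, True
    return response_az, False


# A target straight ahead that is reported directly behind counts as a reversal.
print(resolve_front_back(0.0, 180.0))
print(resolve_front_back(30.0, 35.0))
```

Under this criterion a target at 0° that is heard at 180° is folded back onto the target, which is why uncorrected median-plane data can lower the azimuth correlation.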

In contrast, Figures 1b and 1d demonstrate the deleterious
effect of limited bandwidth and noise on localization in
azimuth. Localization in elevation, on the other hand, was
poor in all conditions, as evidenced by Figures 2a-d.

Figure 1. Response azimuth as a function of target azimuth
(Figs 1a-b represent data from Experiment 1; Figs 1c-d
represent data from Experiment 2).

Figure 2. Response elevation as a function of target elevation
(Figs 2a-b represent data from Experiment 1; Figs 2c-d
represent data from Experiment 2).


CONCLUSIONS 

The  results  of  the  experiments  described  herein  clearly 
demonstrate  the  efficacy  of  existing  virtual  audio  displays  for 
the  presentation  of  veridical  cues  to  the  locations  of  sound 
sources  in  the  horizontal  plane.  What  is  equally  clear  is  that 
this is not the case in the vertical plane. Indeed, the
distribution of responses and the range of correlation
coefficients (.31 < r < .007) indicate that participants
localized poorly in elevation even under optimal noise and
bandwidth conditions. Given that the displays employed in
these  investigations  represent  the  state  of  the  art  in  spatial 
audio  technology,  it  makes  sense  for  designers  to  consider, 
pending  further  technological  developments,  constraining  their 
displays  such  that  only  the  azimuth  is  cued. 